WordNet-based lexical simplification of a document
Abstract
We explore algorithms for automatically generating a limited-size lexicon from a document, such that the lexicon covers as much of the document's semantic space as possible, as specifically as possible. We evaluate six related algorithms that derive limited-size vocabularies from Wikipedia articles, focusing on nouns and verbs. The proposed algorithms combine Personalized PageRank (Agirre and Soroa, 2009) with principles of information maximization, beginning with a user-supplied document and constructing a small customized vocabulary using WordNet. The best-performing algorithm relies on word-sense disambiguation with sentence-level context at the earliest stage of analysis, indicating that this computationally costly task is nonetheless valuable.
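As a rough illustration of the Personalized PageRank component only, the sketch below runs power iteration over a tiny hand-made concept graph and keeps the top-ranked nodes as a small vocabulary. It is not one of the six evaluated algorithms: the graph edges, the seed weights, the damping value, and the function names `personalized_pagerank` and `select_vocabulary` are all assumptions made for illustration, and the real system would draw its graph from WordNet and add the information-maximization and word-sense disambiguation steps.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank over an undirected graph.

    edges: iterable of (node, node) pairs.
    seeds: dict mapping graph nodes to non-negative weights, e.g. how
           often each concept occurs in the document (hypothetical input).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Teleport distribution concentrated on the document's own concepts.
    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes back to the seed concepts.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        # The remaining mass is split evenly among each node's neighbors.
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def select_vocabulary(rank, size):
    """Return the `size` highest-ranked nodes as the limited vocabulary."""
    ordered = sorted(rank, key=lambda n: (-rank[n], n))
    return ordered[:size]


# Toy hypernym-style graph (invented for this example, not taken from WordNet).
edges = [
    ("dog", "canine"), ("canine", "carnivore"),
    ("cat", "feline"), ("feline", "carnivore"),
    ("carnivore", "animal"), ("animal", "organism"),
    ("oak", "tree"), ("tree", "plant"), ("plant", "organism"),
]
# Hypothetical counts of concepts found in the user-supplied document.
seeds = {"dog": 3, "cat": 2, "oak": 1}

rank = personalized_pagerank(edges, seeds)
vocabulary = select_vocabulary(rank, size=3)
print(vocabulary)
```

The seed weights bias the random walk toward concepts the document actually mentions, so nearby, more general nodes accumulate rank; a size-limited vocabulary can then be read off the top of the ranking. All seed nodes are assumed to appear in the graph.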
Similar resources
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
The Effect of Reducing Lexical and Syntactic Complexity of Texts on Reading Comprehension
The present study investigated the effect of different types of text simplification (i.e., reducing the lexical and syntactic complexity of texts) on the reading comprehension of English as a Foreign Language (EFL) learners. Sixty female intermediate EFL learners from three intact classes at Tabarestan Language Institute in Tehran participated in the study. The intact classes were assigned to three...
Lexical-semantic SLVM for XML Document Classification
The structured link vector model (SLVM) and its improved version depend on statistical term measures to represent XML documents. As a result, they ignore the lexical semantics of terms and their mutual information, leading to text classification errors. This paper proposes an XML document representation method, WordNet-based lexical-semantic SLVM, to solve the problem. Using WordNet, thi...
Semantic Feature Structure Extraction from Documents Based on Extended Lexical Chains
The meaning of a sentence in a document is more easily determined if its constituent words exhibit cohesion with respect to their individual semantics. This paper explores the degree of cohesion among a document's words using lexical chains as a semantic representation of its meaning. Using a combination of diverse types of lexical chains, we develop a text document representation that can be u...
Integrating a Lexical Database and a Training Collection for Text Categorization
Automatic text categorization is a complex and useful task for many natural language processing applications. Recent approaches to text categorization focus more on algorithms than on the resources involved in this operation. In contrast to this trend, we present an approach based on the integration of widely available resources such as lexical databases and training collections to overcome current limi...